Linguistic Issues in Language Technology LiLT

Author

  • Masood Ghayoomi
Abstract

In this paper, we describe ongoing research to develop an HPSG-based treebank for Persian. To this end, we use a bootstrapping approach for the data annotation. In the first step, a set of seed rules is defined as regular expressions in the CLaRK system. Then, the data is shallow-processed with this set of rules. In the next step, a human annotator completes the annotation of the sentences manually. To increase the share of automatic annotation, we extract the manually applied rules and iteratively augment the seed rules with the rules applied most frequently in the manual annotation. Our experiment in building the Persian treebank, which currently contains 1000 sentences, shows that the proposed method reduces human intervention from 74.05% in the first iterations to 39.01% in the last iterations.

LiLT Volume 7, Issue 19, January 2012. Bootstrapping the Development of an HPSG-based Treebank for Persian. Copyright © 2012, CSLI Publications.
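The bootstrapping loop described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the paper's implementation: Python's `re` module stands in for the CLaRK system, a sentence is reduced to a string of part-of-speech tags, the human annotator is simulated by a set of gold annotations, and the names `auto_annotate`, `bootstrap`, and `min_count` are hypothetical. The toy data are invented for illustration and are unrelated to the figures reported above.

```python
import re
from collections import Counter


def auto_annotate(tag_string, rules):
    """Return the (tag sequence, label) pairs that the regex rules find."""
    found = set()
    for pattern, label in rules:
        for match in re.finditer(pattern, tag_string):
            found.add((match.group(0), label))
    return found


def bootstrap(batches, seed_rules, min_count=2):
    """Annotate batch by batch, promoting frequent manual rules to seed rules.

    Returns the final rule list and, per batch, the share of annotations
    that had to be supplied manually (the simulated human intervention).
    """
    rules = list(seed_rules)
    manual_share = []
    for batch in batches:
        manual = Counter()  # annotations the rules missed in this batch
        total = 0
        for tag_string, gold in batch:
            auto = auto_annotate(tag_string, rules)
            for item in gold:
                total += 1
                if item not in auto:
                    manual[item] += 1  # the annotator adds this one by hand
        manual_share.append(sum(manual.values()) / total if total else 0.0)
        # Promote manual annotations seen at least min_count times
        # (hypothetical threshold) to new regular-expression rules.
        for (sequence, label), count in manual.items():
            if count >= min_count:
                rules.append((r"\b" + re.escape(sequence) + r"\b", label))
    return rules, manual_share


# Toy data (invented): each sentence is a POS-tag string plus gold phrases.
seed = [(r"\bDET N\b", "NP")]
batch1 = [
    ("DET N V", {("DET N", "NP")}),
    ("P DET N V", {("DET N", "NP"), ("P DET N", "PP")}),
    ("P DET N", {("DET N", "NP"), ("P DET N", "PP")}),
]
batch2 = [("P DET N V", {("DET N", "NP"), ("P DET N", "PP")})]

rules, shares = bootstrap([batch1, batch2], seed)
print(shares)
```

In this toy run the prepositional-phrase pattern is annotated by hand twice in the first batch, is promoted to a rule, and is then found automatically in the second batch, so the manual share drops between iterations.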

Similar resources

Linguistic Issues in Language Technology LiLT

In this paper, we overview the ways in which computational methods can serve the goals of analysis and theory development in linguistics, and encourage the reader to become involved in the emerging cyberinfrastructure for linguistics. We survey examples from diverse subfields of how computational methods are already being used, describe the current state of the art in cyberinfrastructure for li...

Linguistic Issues in Language Technology – LiLT

Lakoff (1974) argues that affective demonstratives in English are markers of solidarity, with exclamative overtones deriving from their close association with evaluative predication. Focusing on this, we seek to inform these claims using quantitative corpus evidence. Our experiments suggest that affectivity is not limited to specific uses of this, but rather that it arises in a wide range of li...

Linguistic Issues in Language Technology – LiLT

Morphology is a key component for many Language Technology applications. However, morphological relations, especially those relying on the derivation and compounding processes, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation...


Publication date: 2011